Value Iteration is Optic Composition


Abstract

Dynamic programming is a class of algorithms used to compute optimal control policies for Markov decision processes. Dynamic programming is ubiquitous in control theory, and is also the foundation of reinforcement learning. In this paper, we show that value improvement, one of the main steps of dynamic programming, can be naturally seen as composition in a category of optics, and intuitively, the optimal value function is the limit of a chain of optic compositions. We illustrate this with three classic examples: the gridworld, the inverted pendulum and the savings problem. This is a first step towards a complete account of reinforcement learning in terms of parametrised optics.
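The idea that value improvement behaves like composition can be illustrated with a deliberately simplified sketch. It assumes a deterministic five-cell gridworld on a line and uses plain lenses (a forward map on states, a backward map on values) instead of the general optics of the paper. The names Lens, compose, step_lens, improve and value_iteration, the reward of 1 for entering the goal cell, and the discount factor 0.9 are all assumptions made for this illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List

N = 5          # number of cells on the line (assumed)
GOAL = N - 1   # absorbing goal cell (assumed)
GAMMA = 0.9    # discount factor (assumed)
ACTIONS = (-1, +1)


@dataclass(frozen=True)
class Lens:
    """A deterministic lens: states flow forward, values flow backward."""
    fwd: Callable[[int], int]           # state -> next state
    bwd: Callable[[int, float], float]  # (state, downstream value) -> value


def compose(first: Lens, second: Lens) -> Lens:
    """Run `first` then `second` forward; pull values back in reverse order."""
    def fwd(s: int) -> int:
        return second.fwd(first.fwd(s))

    def bwd(s: int, v: float) -> float:
        return first.bwd(s, second.bwd(first.fwd(s), v))

    return Lens(fwd, bwd)


def step_lens(action: int) -> Lens:
    """One environment step under a fixed action, packaged as a lens."""
    def fwd(s: int) -> int:
        if s == GOAL:
            return s  # the goal is absorbing
        return min(max(s + action, 0), N - 1)

    def bwd(s: int, v: float) -> float:
        # Reward 1 for entering the goal, then discount the downstream value.
        reward = 1.0 if (s != GOAL and fwd(s) == GOAL) else 0.0
        return reward + GAMMA * v

    return Lens(fwd, bwd)


def improve(value: List[float]) -> List[float]:
    """One value-improvement step: pull `value` back through the best step lens."""
    return [
        max(step_lens(a).bwd(s, value[step_lens(a).fwd(s)]) for a in ACTIONS)
        for s in range(N)
    ]


def value_iteration(tol: float = 1e-9, max_iters: int = 1000) -> List[float]:
    """Iterate `improve` from the zero value function until it stops changing."""
    value = [0.0] * N
    for _ in range(max_iters):
        new_value = improve(value)
        if max(abs(x - y) for x, y in zip(new_value, value)) < tol:
            return new_value
        value = new_value
    return value


# Composing four "move right" lenses and pulling back the value 0
# gives the discounted return of that four-step plan from cell 0.
four_right = reduce(compose, [step_lens(+1)] * 4)
plan_value = four_right.bwd(0, 0.0)
optimal = value_iteration()
```

In this toy setting, the value obtained by pulling zero back through the four-step composite from cell 0 agrees with the fixed point that value_iteration reaches for that cell, which is the deterministic shadow of the claim that the optimal value function arises as a limit of compositions. Stochastic transitions would need a richer notion of optic than the plain lens used here.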


Related resources

Factored Value Iteration Converges

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one, the least-squares projection operator is modified so that it does not increase max-norm, and thus preserves convergence. The other modification is that we un...


Value Pursuit Iteration

Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close to optimal policy for reinforcement learning problems with large state spaces. VPI has two main features: First, it is a nonparametric algorithm that finds a good sparse approximation of the optimal value function given a dictionary of features. The algorithm is almost insensitive to the number of irrel...


External Memory Value Iteration

We propose a unified approach to disk-based search for deterministic, non-deterministic, and probabilistic (MDP) settings. We provide the design of an external Value Iteration algorithm that performs at most O(l_G · scan(|E|) + t_max · sort(|E|)) I/Os, where l_G is the length of the largest back-edge in the breadth-first search graph G having |E| edges, t_max is the maximum number of iterations, an...


Value Iteration Networks

We introduce the value iteration network (VIN): a fully differentiable neural network with a ‘planning module’ embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented a...


Focused Topological Value Iteration

Topological value iteration (TVI) is an effective algorithm for solving Markov decision processes (MDPs) optimally, which 1) divides an MDP into strongly-connected components, and 2) solves these components sequentially. Yet, TVI’s usefulness tends to degrade if an MDP has large components, because the cost of the division process isn’t offset by gains during solution. This paper presents a new...



Journal

Journal title: Electronic Proceedings in Theoretical Computer Science

Year: 2023

ISSN: 2075-2180

DOI: https://doi.org/10.4204/eptcs.380.24